Take-home Exercise 3

Interactivity in Visual Analytics

Antonius Handy https://www.linkedin.com/in/antoniushandy (Singapore Management University, Master of IT in Business)https://scis.smu.edu.sg/master-it-business
2022-05-15

1. Overview

In this take-home exercise 3, we are going to reveal the economic of the city of Engagement, Ohio USA by using appropriate static and interactive statistical graphics methods. We will show how the financial health of the residents changes over the period of time and how the wages are compared against overall cost of living.

The data is processed by using appropriate tidyverse family of packages, whereas the statistical graphics are prepared using ggplot2 and its extensions.

2. Getting Started

Before we get started, it is important for us to ensure that the required R packages have been installed. If yes, we will load the R packages. If they have yet to be installed, we will install the R packages and load them onto R environment.

The packages required for this exercise are tidyverse, ggiraph, plotly and gganimate, zoo and [trelliscopejs](https://hafen.github.io/trelliscopejs/index.html.

packages = c('tidyverse','ggiraph','plotly','gganimate','zoo','trelliscopejs')

for(p in packages){
  if(!require(p, character.only = T)){
    install.packages(p)
  }
  library(p, character.only = T)
}

3. Data

3.1 Data Source

The original dataset was obtained from VAST Challenge 2022 in csv format. It shows the financial report of the residents of Engagement, OH that have agreed to participate in this study from March 2022 until May 2023.

3.2 Importing Data

The code chunk below imports a dataset called FinancialJournal.csv into R by using read_csv() of readr and saves it as tibble data frame called financial. It consists of 1,856,330 records as shown below.

financial <- read_csv("data/FinancialJournal.csv")
head(financial)
# A tibble: 6 x 4
  participantId timestamp           category  amount
          <dbl> <dttm>              <chr>      <dbl>
1             0 2022-03-01 00:00:00 Wage      2473. 
2             0 2022-03-01 00:00:00 Shelter   -555. 
3             0 2022-03-01 00:00:00 Education  -38.0
4             1 2022-03-01 00:00:00 Wage      2047. 
5             1 2022-03-01 00:00:00 Shelter   -555. 
6             1 2022-03-01 00:00:00 Education  -38.0

3.3 Data Wrangling

As we are not interested in time, we will only pick up the month and year by using as.yearmon() of zoo.

The financial amounts are allocated into 6 categories namely Education, Food, Recreation, Rent Adjustment, Shelter and Wage. We are going to sum them up and group them by participant ID, yearmonth and category.

financial <- financial %>%
  mutate(yearmonth = as.yearmon(timestamp)) %>%
  group_by(participantId, yearmonth, category) %>%
  summarise(Total = sum(amount))
head(financial)
# A tibble: 6 x 4
# Groups:   participantId, yearmonth [2]
  participantId yearmonth category     Total
          <dbl> <yearmon> <chr>        <dbl>
1             0 Mar 2022  Education    -76.0
2             0 Mar 2022  Food        -268. 
3             0 Mar 2022  Recreation  -349. 
4             0 Mar 2022  Shelter    -1110. 
5             0 Mar 2022  Wage       11932. 
6             0 Apr 2022  Education    -38.0

In this code chunk below, we are going to create a new table called expenses to reveal the living cost of the residents by excluding Wage and Rent Adjustment from category column.

expenses <- financial %>%
  filter(category != 'Wage') %>%
  filter(category != 'RentAdjustment')

Next, we are going to use pivot_wider() of dplyr to create new columns based on the amount of each category, and rename the new table as fin_wide.

We will also strip all NA values by specifying na.rm = TRUE and calculate the living cost and revenue as follows.

Living cost = Education + Food + Recreation + Shelter

Revenue = Wage + Rent Adjustment

fin_wide <- financial %>%
  pivot_wider(names_from = "category",
              values_from = "Total") %>%
  rowwise() %>%
  mutate(LivingCost = sum(c_across(c(1:4)),na.rm=T)) %>%
  mutate(Revenue = sum(c_across(c(5:6)),na.rm=T))
head(fin_wide)
# A tibble: 6 x 10
# Rowwise:  participantId, yearmonth
  participantId yearmonth Education  Food Recreation Shelter   Wage
          <dbl> <yearmon>     <dbl> <dbl>      <dbl>   <dbl>  <dbl>
1             0 Mar 2022      -76.0 -268.      -349.  -1110. 11932.
2             0 Apr 2022      -38.0 -266.      -219.   -555.  8637.
3             0 May 2022      -38.0 -265.      -383.   -555.  9048.
4             0 Jun 2022      -38.0 -257.      -466.   -555.  9048.
5             0 Jul 2022      -38.0 -270.     -1070.   -555.  8637.
6             0 Aug 2022      -38.0 -262.      -314.   -555.  9459.
# ... with 3 more variables: RentAdjustment <dbl>, LivingCost <dbl>,
#   Revenue <dbl>

Total savings will be computed as revenue minus living cost. We will rename the new table as fin_summary and the new column as Savings.

fin_summary <- financial %>%
  group_by(participantId, yearmonth) %>%
  summarise(Savings = sum(Total))
head(fin_summary)
# A tibble: 6 x 3
# Groups:   participantId [1]
  participantId yearmonth Savings
          <dbl> <yearmon>   <dbl>
1             0 Mar 2022   10129.
2             0 Apr 2022    7559.
3             0 May 2022    7808.
4             0 Jun 2022    7733.
5             0 Jul 2022    6704.
6             0 Aug 2022    8290.

4. Economic Visualisation

Financial Health of Residents from March 2022 to May 2023

qplot(yearmonth, Savings, data = fin_summary) +
  labs(x = "Year & Month",
       y = "Savings") +
  theme_bw() +
  facet_trelliscope(~ participantId,
                    nrow = 2, ncol = 5,
                    width = 450,
                    path = "trellis/",
                    self_contained = TRUE)

Average Monthly Expenses

On average, residents allocate most of their monthly savings for shelter.

p <- ggplot(data = expenses,
       aes(x = category, y = Total*-1))+
  geom_boxplot() +
  xlab("Category") +
  ylab("Amount ($)") +
  ggtitle("Average Monthly Expenses for Each Resident")
ggplotly(p)

5. References

https://www.statology.org/sum-by-group-in-r/

https://dcl-wrangle.stanford.edu/pivot-advanced.html#wider-1